CT 2.3.2 Bi-clique Communities
Abstract
We present a method for detecting communities in bipartite networks. Based on an extension of the k-clique community detection algorithm suggested by Palla et al. [1], we demonstrate how modular structure in bipartite networks presents itself as overlapping bicliques. The talk is based on work discussed in Ref. [2].
Bipartite networks are ubiquitous
A bipartite network is a network whose nodes are naturally divided into two disjoint sets such that each link connects a node in one set to a node in the other set. The raw data structure behind many well-known unipartite networks is bipartite. Social networks, such as the Internet Movie Database (IMDb), the scientific collaboration network in condensed matter physics from www.arxiv.org (cond-mat), and many artistic collaboration networks are all bipartite. In biology, the metabolic network is naturally bipartite, and so are many other biological networks, simply because DNA rarely interacts directly with DNA. Within the world of information networks, the bipartite structure is also ubiquitous, linking documents (web pages, emails, dictionary entries) to their content. When we consider only the unipartite projection of these networks, important structural information is lost.
Community definition
In analogy with the unipartite case, the basic observation upon which our community definition relies is that a typical community consists of several complete sub-bigraphs that tend to share many of their nodes. A $K_{a,b}$ biclique is a complete sub-bigraph with $a$ nodes in the upper (black) node set and $b$ nodes in the lower (blue) node set of Figure 2. A $K_{a,b}$ biclique can be identical to a maximal complete sub-bigraph, or it can exist on a subset of the nodes of a maximal complete sub-bigraph. Generalizing from [1], we define a $K_{a,b}$ biclique community as the union of all $K_{a,b}$ bicliques that can be reached from each other through a series of adjacent $K_{a,b}$ bicliques.
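The biclique definition above can be illustrated with a brute-force enumeration. This is a minimal sketch for small toy graphs, not the algorithm used in the talk; the function name `enumerate_bicliques` and the dictionary-of-sets representation (upper node mapped to its lower neighbours) are assumptions made for illustration.

```python
from itertools import combinations


def enumerate_bicliques(adj, a, b):
    """List every K_{a,b} biclique of a bipartite graph by brute force.

    adj is a hypothetical representation: a dict mapping each upper node
    to the set of lower nodes it is linked to. Each biclique is returned
    as a pair (frozenset of a upper nodes, frozenset of b lower nodes).
    """
    if a < 1 or b < 1:
        raise ValueError("a and b must both be at least 1")
    bicliques = []
    for uppers in combinations(sorted(adj), a):
        # Lower nodes linked to every chosen upper node.
        common = set.intersection(*(adj[u] for u in uppers))
        # Any b of those common neighbours complete a K_{a,b} biclique.
        for lowers in combinations(sorted(common), b):
            bicliques.append((frozenset(uppers), frozenset(lowers)))
    return bicliques


# Toy author-paper network (made-up data).
toy = {"A1": {"p1", "p2"}, "A2": {"p1", "p2", "p3"}, "A3": {"p3"}}
k22 = enumerate_bicliques(toy, 2, 2)
```

On this toy network there is a single $K_{2,2}$ biclique (authors A1 and A2 with papers p1 and p2), and it is also a maximal complete sub-bigraph only on its upper side, since A2 has a third paper that A1 does not share. The enumeration is exponential in the worst case and is meant only to make the definition concrete.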
We define two $K_{a,b}$ bicliques to be adjacent if their overlap is at least a $K_{a-1,b-1}$ biclique. In other words, the two bicliques must share at least $a-1$ upper vertices and $b-1$ lower vertices (see Figure 3). The precise relationship to the k-cliques is well understood. We are able to demonstrate that, for many sparse networks, the complexity of the algorithm is $O(M^2)$, where $M$ is the number of edges in the network.
An important feature of the biclique community approach is that it provides an immediate context for the communities that are detected. In the movie-actor network, a list of actors is always accompanied by a list of films. This context makes it immediately clear why the actors in a group belong together. In the metabolic network, a list of metabolites is accompanied by a list of the reactions they participate in, and so on. The presence of context is an important help in determining the function of detected communities. In this sense, the bicommunity information is more valuable than that obtained by finding structure in the two unipartite projections, because it provides specific links between the communities that are present in the two node sets.
Networks of communities
Each partition $K_{a,b}$ results in a network of communities, in which each community is a node (with size proportional to community size) that is linked to other communities via node overlap. There are thus two kinds of links: red for overlapping upper nodes and blue for overlapping lower nodes. The networks of communities in Figure 4 illustrate how bicliques can be used to analyze very different network properties; see the figure caption.
Figure 4. Condensed matter physics; networks of communities.
(a) The one-node network for $K_{1,1}$. The node displays the ratio of authors (red) to papers (blue) in the bipartite network.
(b) A mostly blue, paper-dominated network emerges when we choose $a$ small and $b$ large.
In other words, if we want to analyze groups of longtime collaborators, we choose bicliques with few authors and many papers. Here, each community contains only a few authors and many papers, while the overlap with other communities of longtime collaborators consists mainly of papers (mainly blue links). The largest community in Figure 4 (b) has 12 authors and 290 papers, but such a large collaboration is the exception rather than the rule; most communities contain longstanding theoretical collaborations among 2-4 authors who have written between 20 and 60 papers together.
(c) A balanced network. In the intermediate range, where $a$ is about the same size as $b$, we find balanced groups of medium size that overlap in both papers and authors (both red and blue links).
(d) A mostly red, author-dominated network. If we wish to search for large collaborations, we choose bicliques with many authors and fewer papers. This allows us to find communities of large (typically experimental) collaborations; in this case the communities contain many authors and few papers, while the node overlap with other communities consists of authors (mainly red links).
The considerations above are specific to the cond-mat network, but a similar analysis can be performed on any bipartite network. We have developed a software tool for visualization and analysis of bipartite networks, which may be freely downloaded [3].
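The community definition used in this abstract (bicliques chained together through adjacency, where adjacent $K_{a,b}$ bicliques share at least $a-1$ upper and $b-1$ lower nodes) can be sketched as follows. This is an illustrative brute-force version under stated assumptions, not the authors' implementation or the downloadable tool [3]: the names `enumerate_bicliques` and `biclique_communities`, the dictionary-of-sets input, and the union-find merging step are all choices made for this example, and the all-pairs comparison does not reflect the complexity quoted for the actual algorithm.

```python
from itertools import combinations


def enumerate_bicliques(adj, a, b):
    """Brute-force list of K_{a,b} bicliques.

    adj (assumed format): dict mapping upper node -> set of lower neighbours.
    """
    if a < 1 or b < 1:
        raise ValueError("a and b must both be at least 1")
    found = []
    for uppers in combinations(sorted(adj), a):
        common = set.intersection(*(adj[u] for u in uppers))
        for lowers in combinations(sorted(common), b):
            found.append((frozenset(uppers), frozenset(lowers)))
    return found


def biclique_communities(adj, a, b):
    """Group K_{a,b} bicliques into communities via chains of adjacency."""
    cliques = enumerate_bicliques(adj, a, b)
    parent = list(range(len(cliques)))  # union-find forest over bicliques

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        (up_i, low_i), (up_j, low_j) = cliques[i], cliques[j]
        # Adjacent: overlap contains at least a K_{a-1,b-1} biclique.
        if len(up_i & up_j) >= a - 1 and len(low_i & low_j) >= b - 1:
            parent[find(i)] = find(j)

    # A community is the union of all bicliques in one connected group.
    groups = {}
    for i, (up, low) in enumerate(cliques):
        all_up, all_low = groups.setdefault(find(i), (set(), set()))
        all_up |= up
        all_low |= low
    return [(frozenset(u), frozenset(l)) for u, l in groups.values()]


# Toy network (made-up data): two groups of upper nodes with no shared lower nodes.
toy = {
    "u1": {"l1", "l2"},
    "u2": {"l1", "l2", "l3"},
    "u3": {"l2", "l3"},
    "u4": {"l5", "l6"},
    "u5": {"l5", "l6"},
}
communities = biclique_communities(toy, 2, 2)
```

In the toy network, the bicliques on (u1, u2) and (u2, u3) share one upper node and one lower node, so for $a = b = 2$ they merge into a single community containing u1, u2, u3 and l1, l2, l3, while (u4, u5) with l5, l6 forms a second community. Each community comes back as a pair of node sets, which mirrors the point made above that every group of upper nodes is returned together with its context in the lower node set.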